
Add Azure Kubernetes Service (AKS) hosting support#16088

Merged
mitchdenny merged 78 commits into main from feature/aks-support
Apr 17, 2026

@mitchdenny
Member

Description

WIP — Adds first-class Azure Kubernetes Service (AKS) support to Aspire via a new Aspire.Hosting.Azure.Kubernetes package.

Motivation

Aspire's Aspire.Hosting.Kubernetes package supports end-to-end deployment to any conformant Kubernetes cluster via Helm charts, but it has no awareness of Azure-specific capabilities. Users who deploy to AKS must manually provision the cluster, configure workload identity, set up monitoring, and manage networking outside of Aspire.

What's here so far (Phase 1)

  • New Aspire.Hosting.Azure.Kubernetes package with dependencies on Aspire.Hosting.Kubernetes and Aspire.Hosting.Azure
  • AzureKubernetesEnvironmentResource — unified resource that extends AzureProvisioningResource and implements IAzureComputeEnvironmentResource, internally wrapping a KubernetesEnvironmentResource for Helm deployment
  • AddAzureKubernetesEnvironment() entry point (mirrors AddAzureContainerAppEnvironment() pattern)
  • Configuration extensions: WithVersion, WithSkuTier, WithNodePool, AsPrivateCluster, WithContainerInsights, WithAzureLogAnalyticsWorkspace
  • AzureKubernetesInfrastructure eventing subscriber
  • Implementation spec at docs/specs/aks-support.md

What's planned next

  • Workload identity (federated credentials + ServiceAccount YAML generation)
  • VNet integration (WithDelegatedSubnet)
  • Full Bicep provisioning (pending Azure.Provisioning.ContainerService package availability in internal feeds)
  • Unit tests with Bicep snapshot verification
  • E2E deployment tests

Validation

  • Package builds successfully with dotnet build /p:SkipNativeBuild=true
  • Follows established patterns from Aspire.Hosting.Azure.AppContainers

Fixes # (issue)

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
    • No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes
    • No
  • Did you add public API?
    • Yes
      • If yes, did you have an API Review for it?
        • Yes
        • No
      • Did you add <remarks /> and <code /> elements on your triple slash comments?
        • Yes
        • No
    • No
  • Does the change make any security assumptions or guarantees?
    • Yes
    • No
  • Does the change require an update in our Aspire docs?

@github-actions
Contributor

github-actions bot commented Apr 12, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16088

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16088"

@github-actions
Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentExtensions.cs Outdated
Comment thread docs/specs/aks-support.md
Member Author

@mitchdenny mitchdenny left a comment


Code review: found 7 issues — 2 bugs (deadlock + missing exit code check), 1 security concern (credential file leak), 2 correctness issues (orphaned resources, redundant allocation), 1 behavioral concern (FindNodePoolResource identity), 1 documentation gap (region-locked VM sizes).

Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs Outdated
Comment thread src/Aspire.Hosting.Kubernetes/KubernetesResource.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/tools/GenVmSizes.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AksNodeVmSizes.Generated.cs
@mitchdenny mitchdenny marked this pull request as ready for review April 15, 2026 02:33
@mitchdenny mitchdenny self-assigned this Apr 15, 2026
@mitchdenny mitchdenny added this to the 13.3 milestone Apr 15, 2026
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds first-class Azure Kubernetes Service (AKS) support to Aspire via a new Aspire.Hosting.Azure.Kubernetes package, integrating Azure provisioning (Bicep via Azure.Provisioning) with the existing Helm-based Kubernetes publishing pipeline.

Changes:

  • Introduces AddAzureKubernetesEnvironment() and AKS resource types (AKS cluster, node pools, subnet/workload identity wiring) plus an AKS-specific infrastructure subscriber.
  • Extends Kubernetes publishing to support parent compute environments (AKS wrapping an inner Kubernetes environment), node pool scheduling, kubeconfig targeting, and deploy-time IValueProvider resolution for Helm values.
  • Adds tests (including Bicep snapshot verification) and automation to periodically update VM size constants.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/Aspire.Hosting.Azure.Kubernetes.Tests/Snapshots/AzureKubernetesEnvironmentExtensionsTests.AddAzureKubernetesEnvironment_WithVersion.verified.bicep | Adds verified Bicep snapshot for AKS version configuration. |
| tests/Aspire.Hosting.Azure.Kubernetes.Tests/Snapshots/AzureKubernetesEnvironmentExtensionsTests.AddAzureKubernetesEnvironment_BasicConfiguration.verified.bicep | Adds verified Bicep snapshot for basic AKS provisioning outputs. |
| tests/Aspire.Hosting.Azure.Kubernetes.Tests/AzureKubernetesInfrastructureTests.cs | Adds tests for AKS infra behavior (default user pool, affinity, deployment target/registry flow). |
| tests/Aspire.Hosting.Azure.Kubernetes.Tests/AzureKubernetesEnvironmentExtensionsTests.cs | Adds API and configuration tests for AKS environment extensions and Bicep generation. |
| tests/Aspire.Hosting.Azure.Kubernetes.Tests/Aspire.Hosting.Azure.Kubernetes.Tests.csproj | Introduces new test project for AKS hosting package. |
| src/Aspire.Hosting.Kubernetes/Resources/ServiceAccountV1.cs | Adds Kubernetes ServiceAccount resource model for YAML generation (workload identity). |
| src/Aspire.Hosting.Kubernetes/KubernetesResource.cs | Adds support for deferring Helm value resolution via IValueProvider and normalizes expression keys. |
| src/Aspire.Hosting.Kubernetes/KubernetesPublishingContext.cs | Adds parent-environment matching, node pool nodeSelector application, and capturing deploy-time value providers. |
| src/Aspire.Hosting.Kubernetes/KubernetesNodePoolResource.cs | Introduces node pool resource abstraction for scheduling workloads in Kubernetes environments. |
| src/Aspire.Hosting.Kubernetes/KubernetesNodePoolAnnotation.cs | Adds annotation to associate compute resources with a node pool for scheduling. |
| src/Aspire.Hosting.Kubernetes/KubernetesInfrastructure.cs | Enables parent environment targeting and sets deployment target annotations accordingly. |
| src/Aspire.Hosting.Kubernetes/KubernetesEnvironmentResource.cs | Adds kubeconfig path support, parent compute env linkage, and captured Helm value providers list. |
| src/Aspire.Hosting.Kubernetes/KubernetesEnvironmentExtensions.cs | Adds public node pool API (AddNodePool, WithNodePool) for Kubernetes environments. |
| src/Aspire.Hosting.Kubernetes/Deployment/HelmDeploymentEngine.cs | Adds deploy-time IValueProvider resolution, and passes --kubeconfig to helm/kubectl when set. |
| src/Aspire.Hosting.Kubernetes/Aspire.Hosting.Kubernetes.csproj | Exposes internals to AKS package and its tests. |
| src/Aspire.Hosting.Azure.Kubernetes/tools/GenVmSizes.cs | Adds tool to generate AKS VM size constants from Azure SKU data. |
| src/Aspire.Hosting.Azure.Kubernetes/README.md | Adds initial package README and basic usage snippet. |
| src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs | Adds AKS event subscriber to fetch kubeconfig, ensure user pool, and wire workload identity/SA resources. |
| src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs | Adds unified AKS provisioning + compute environment resource wrapping inner Kubernetes environment. |
| src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentExtensions.cs | Adds AddAzureKubernetesEnvironment plus configuration extensions (version, tier, subnet, node pools, registry, identity). |
| src/Aspire.Hosting.Azure.Kubernetes/Aspire.Hosting.Azure.Kubernetes.csproj | Introduces new AKS hosting package project and dependencies. |
| src/Aspire.Hosting.Azure.Kubernetes/AksSubnetAnnotation.cs | Adds annotation to track AKS subnet references for Bicep wiring. |
| src/Aspire.Hosting.Azure.Kubernetes/AksSkuTier.cs | Adds AKS SKU tier enum. |
| src/Aspire.Hosting.Azure.Kubernetes/AksNodeVmSizes.Generated.cs | Adds generated VM size constants consumed by node pool configuration. |
| src/Aspire.Hosting.Azure.Kubernetes/AksNodePoolResource.cs | Adds AKS-specific node pool resource extending kubernetes node pool abstraction. |
| src/Aspire.Hosting.Azure.Kubernetes/AksNodePoolConfig.cs | Adds node pool config record + pool mode enum (System/User). |
| src/Aspire.Hosting.Azure.Kubernetes/AksNetworkProfile.cs | Adds internal network profile model for AKS network settings. |
| docs/specs/aks-support.md | Adds implementation spec describing architecture, phases, and design decisions. |
| Directory.Packages.props | Adds Azure.Provisioning.ContainerService dependency version. |
| Aspire.slnx | Adds AKS package and test projects to the solution. |
| .github/workflows/update-azure-vm-sizes.yml | Adds scheduled workflow to regenerate VM size constants and open a PR. |

Comment on lines +248 to +252
// Get the actual provisioned cluster name from the Bicep output.
// The Azure.Provisioning SDK may add a unique suffix to the name
// (e.g., take('aks-${uniqueString(resourceGroup().id)}', 63)).
var clusterName = await environment.NameOutputReference.GetValueAsync(context.CancellationToken).ConfigureAwait(false)
?? environment.Name;
Copilot AI Apr 15, 2026

environment.Name is the Aspire resource name (e.g., "aks"), but the provisioned AKS cluster name in Bicep is generated (see snapshots: take('aks-${uniqueString(resourceGroup().id)}', 63)). This will cause az aks get-credentials (and the resource group lookup) to target a non-existent cluster. Use the provisioned cluster name output instead (e.g., resolve environment.NameOutputReference after provisioning) and pass that resolved name through to GetResourceGroupAsync / az aks get-credentials.

// Scope the Helm chart name to this AKS environment to avoid
// conflicts when multiple environments deploy to the same cluster
// or when re-deploying with different environment names.
k8sEnvBuilder.Resource.HelmChartName = $"{builder.Environment.ApplicationName}-{name}".ToLowerInvariant().Replace(' ', '-');
Copilot AI Apr 15, 2026

Helm chart names have stricter character constraints than just lowercasing and replacing spaces; application names can include underscores or other characters that may result in an invalid chart/release name. Prefer using the existing Helm/Kubernetes naming helper used elsewhere in the repo (e.g., a ToHelmChartName()-style sanitizer) to guarantee a valid name.

Suggested change
- k8sEnvBuilder.Resource.HelmChartName = $"{builder.Environment.ApplicationName}-{name}".ToLowerInvariant().Replace(' ', '-');
+ k8sEnvBuilder.Resource.HelmChartName = $"{builder.Environment.ApplicationName}-{name}".ToHelmChartName();
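
To illustrate why lowercasing and replacing spaces is not enough, here is a minimal sketch of a chart-name sanitizer. `ToChartNameSketch` is a hypothetical helper written for this note; it is not the repo's `ToHelmChartName()`, whose actual rules may differ. The allowed character set (lowercase letters, digits, dashes) and the 53-character cap are assumptions based on common Helm naming guidance.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sanitizer for illustration only; the repo's ToHelmChartName() may behave differently.
static string ToChartNameSketch(string input)
{
    // Lowercase, replace every run of disallowed characters with one dash,
    // then drop leading and trailing dashes.
    var candidate = Regex.Replace(input.ToLowerInvariant(), "[^a-z0-9-]+", "-").Trim('-');

    // Assumed cap: Helm limits release names to 53 characters.
    if (candidate.Length > 53)
    {
        candidate = candidate[..53].TrimEnd('-');
    }

    if (candidate.Length == 0)
    {
        throw new ArgumentException("Name has no characters usable in a chart name.", nameof(input));
    }

    return candidate;
}

// Underscores and dots survive ToLowerInvariant().Replace(' ', '-') but are handled here.
Console.WriteLine(ToChartNameSketch("My_App.Host-aks")); // prints "my-app-host-aks"
```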

Comment on lines +171 to +172
var defaultPool = new AksNodePoolResource("workload", defaultConfig, environment);
appModel.Resources.Add(defaultPool);
Copilot AI Apr 15, 2026

This adds an AksNodePoolResource directly to the model, but node pools added via AddNodePool(...) are excluded from the manifest in publish mode. Adding the default pool without excluding it can make manifests/publishing output inconsistent (and can expose an implementation detail users didn’t declare). Consider marking this default node pool as excluded-from-manifest using the same mechanism used by the AddNodePool(...) path, or avoid adding it as a standalone resource if it’s only needed to attach KubernetesNodePoolAnnotation.

## Usage example

Then, in the _AppHost.cs_ file of `AppHost`, add an AKS environment and deploy services to it:

Copilot AI Apr 15, 2026

The usage example doesn’t show how to actually target AKS for deployment. As written, myService is not assigned to the AKS compute environment (it likely needs .WithComputeEnvironment(aks) or similar), so users may copy/paste a non-working example.

Comment thread .github/workflows/update-azure-vm-sizes.yml Outdated
mitchdenny and others added 26 commits April 17, 2026 10:06
- r1: Add explanatory comments to #pragma warning disable directives
  explaining why each experimental API suppression is needed
- r11: Remove DelegatedSubnetAnnotation fallback in Bicep generation.
  AKS uses plain subnets (WithSubnet), not delegated subnets. There
  is no legacy path to support.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace all AspireExportIgnore attributes with proper AspireExport
attributes so AKS APIs are available in TypeScript-based AppHosts.

Exported methods: AddAzureKubernetesEnvironment, WithVersion,
WithSkuTier, AddNodePool, AsPrivateCluster, WithSubnet (env + pool),
WithContainerRegistry, WithContainerInsights,
WithAzureLogAnalyticsWorkspace, WithWorkloadIdentity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace raw System.Diagnostics.Process usage with the shared
ProcessSpec/ProcessUtil pattern used by Aspire.Hosting.Azure.

- Use PathLookupHelper.FindFullPathFromPath('az') instead of custom
  FindAzCli() with hardcoded Windows paths
- Use ProcessSpec + ProcessUtil.Run for process execution — handles
  stdout/stderr via callbacks (no deadlock risk), manages process
  lifecycle via IAsyncDisposable
- Extract RunAzCommandAsync helper returning structured result with
  ExitCode, StandardOutput, StandardError
- Check exit code on az resource list (was silently ignored)
- Link ProcessSpec.cs, ProcessUtil.cs, ProcessResult.cs,
  PathLookupHelper.cs into AKS csproj

Fixes review items: r3 (deadlock), r4 (exit code), r9 part (FindAzCli)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
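
For readers unfamiliar with the deadlock this commit refers to: a child process can block when a redirected pipe fills up while the parent is waiting on the other stream or on exit. The sketch below shows the general callback-based pattern using only `System.Diagnostics.Process`. `RunSketchAsync` is a hypothetical helper written for this note, not the PR's `RunAzCommandAsync` or the shared `ProcessUtil`, and it runs `dotnet --version` purely as a stand-in command.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not the PR's RunAzCommandAsync or ProcessUtil.Run.
static async Task<(int ExitCode, string StdOut, string StdErr)> RunSketchAsync(string fileName, params string[] arguments)
{
    var startInfo = new ProcessStartInfo(fileName)
    {
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        UseShellExecute = false,
    };
    foreach (var argument in arguments)
    {
        // ArgumentList passes each argument separately, so no manual quoting is needed.
        startInfo.ArgumentList.Add(argument);
    }

    using var process = new Process { StartInfo = startInfo };
    var stdout = new StringBuilder();
    var stderr = new StringBuilder();

    // Both streams are drained by callbacks, so neither pipe can fill up and block the child.
    process.OutputDataReceived += (_, e) => { if (e.Data is not null) { lock (stdout) { stdout.AppendLine(e.Data); } } };
    process.ErrorDataReceived += (_, e) => { if (e.Data is not null) { lock (stderr) { stderr.AppendLine(e.Data); } } };

    process.Start();
    process.BeginOutputReadLine();
    process.BeginErrorReadLine();

    await process.WaitForExitAsync();
    // The parameterless overload makes sure the asynchronous output handlers have finished.
    process.WaitForExit();

    return (process.ExitCode, stdout.ToString(), stderr.ToString());
}

var result = await RunSketchAsync("dotnet", "--version");

// Check the exit code instead of silently ignoring it.
if (result.ExitCode != 0)
{
    throw new InvalidOperationException($"Command failed with exit code {result.ExitCode}: {result.StdErr}");
}

Console.WriteLine(result.StdOut.Trim());
```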
Replace Directory.CreateTempSubdirectory with IFileSystemService
.TempDirectory.CreateTempSubdirectory so the kubeconfig temp directory
is tracked and automatically cleaned up when the DI container disposes.
This ensures AKS cluster credentials don't persist on disk after the
pipeline completes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use 'az aks get-credentials --file -' to capture kubeconfig content
to stdout rather than letting az CLI write the file directly. This
avoids the az CLI potentially creating the file with permissive
permissions on shared /tmp.

Write the kubeconfig content ourselves to the managed temp directory,
and on Unix set restrictive file permissions (0600 owner-only) via
File.SetUnixFileMode. The temp directory is still managed by
IFileSystemService and auto-cleaned on dispose.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
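
A minimal sketch of the write-then-restrict step described in this commit, assuming .NET 7 or later for `File.SetUnixFileMode`. `WriteKubeconfigSketch` is a hypothetical name, and `Directory.CreateTempSubdirectory` stands in for the managed temp directory that the PR obtains from `IFileSystemService`. The sketch creates the file empty and restricts it before writing the content, which is one possible ordering and not necessarily the one the PR uses.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; the PR's actual implementation may differ.
static string WriteKubeconfigSketch(string directory, string content)
{
    var kubeconfigPath = Path.Combine(directory, "kubeconfig");

    // Create the file empty first so credentials never sit on disk with default permissions.
    File.WriteAllText(kubeconfigPath, string.Empty);

    if (!OperatingSystem.IsWindows())
    {
        // 0600: owner read/write only.
        File.SetUnixFileMode(kubeconfigPath, UnixFileMode.UserRead | UnixFileMode.UserWrite);
    }

    // Overwriting an existing file keeps the mode set above.
    File.WriteAllText(kubeconfigPath, content);
    return kubeconfigPath;
}

// Stand-in for the managed temp directory; cleaned up manually here.
var demoDirectory = Directory.CreateTempSubdirectory("aks-kubeconfig-");
try
{
    var writtenPath = WriteKubeconfigSketch(demoDirectory.FullName, "apiVersion: v1\nkind: Config\n");
    Console.WriteLine($"Wrote {new FileInfo(writtenPath).Length} bytes to {writtenPath}");
}
finally
{
    demoDirectory.Delete(recursive: true);
}
```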
Use object initializer with 'parameter as IValueProvider' instead of
creating a HelmValue, then conditionally creating an identical one
just to set the init-only ValueProviderSource property.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
r5: FindNodePoolResource now searches the app model for existing
AksNodePoolResource instances by name and parent, preserving object
identity with pools created via AddNodePool(). Falls back to creating
a new resource only for pools added via config but not via the API.

r8: Default 'workload' node pool is now added to the app model via
appModel.Resources.Add() so it appears in manifests and pipelines,
matching the behavior of pools created via AddNodePool().

r12: Update AKS API version from 2024-06-02-preview to 2026-01-01
(current GA version).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace hand-crafted StringBuilder Bicep generation with the
Azure.Provisioning.ContainerService SDK types:

- ContainerServiceManagedCluster with V2025_03_01 API version
- ManagedClusterAgentPoolProfile for node pools
- ContainerServiceNetworkProfile for networking
- ManagedClusterSecurityProfile for workload identity
- ManagedClusterOidcIssuerProfile for OIDC
- FederatedIdentityCredential from Azure.Provisioning.Roles
- ContainerRegistryService.FromExisting for ACR references
- RoleAssignment for AcrPull role

Remove GetBicepTemplateString/GetBicepTemplateFile overrides from
AzureKubernetesEnvironmentResource — base class handles it via
the ConfigureInfrastructure callback.

Also resolves r13 (hardcoded serviceCidr/dnsServiceIP) — when a
subnet is configured, Azure CNI is set without hardcoded CIDRs,
letting AKS use its own defaults.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Azure.Provisioning SDK generates cluster names with a unique
suffix: take('aks-${uniqueString(resourceGroup().id)}', 63)

Previously we used environment.Name directly ('aks') which didn't
match the actual provisioned name, causing 'az resource list' to
return empty and the get-credentials step to fail.

Fix: use NameOutputReference.GetValueAsync() to get the actual
provisioned name from the Bicep output. This runs after provisioning
completes, so the output value is available.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use ToHelmChartName() for Helm chart name sanitization instead of
  manual ToLowerInvariant/Replace (handles underscores, dots, etc.)
- Exclude default 'workload' node pool from manifest via
  ManifestPublishingCallbackAnnotation.Ignore (consistent with
  AddNodePool which calls ExcludeFromManifest)
- Fix workflow body to reference AksNodeVmSizes.Generated.cs (was
  stale AzureVmSizes.Generated.cs)
- Add WithComputeEnvironment(aks) to README usage example

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 'text' language to fenced code blocks containing ASCII art
diagrams (MD040 fenced-code-language).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- AddNodePool: validate minCount >= 0, maxCount >= 0, minCount <= maxCount
- WithSubnet: use ResourceAnnotationMutationBehavior.Replace to prevent
  silent overwrites when called multiple times
- Federated credential: resolve namespace from KubernetesNamespaceAnnotation
  instead of hardcoding 'default'. Falls back to 'default' when not set
  or when namespace is parameter-based (Azure AD needs fixed subject)
- Remove AllowUnsafeBlocks: use IProcessRunner from DI (via IVT from
  Aspire.Hosting.Azure) instead of linking ProcessUtil.cs directly
- GenVmSizes.cs: read stdout and stderr concurrently to avoid deadlock
- Snapshot files: add trailing newlines

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These are infrastructure configuration concerns better handled via
ConfigureInfrastructure(...) customization. The internal properties
and Bicep generation logic remain for users who customize directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
End-to-end test that deploys the Aspire starter template to AKS using
the full aspire deploy pipeline (Bicep provisioning + container build +
ACR push + Helm deploy). Follows the ACA deployment test pattern.

Test flow:
1. Create starter project via aspire new
2. Add Aspire.Hosting.Azure.Kubernetes package
3. Modify AppHost to use AddAzureKubernetesEnvironment + WithComputeEnvironment
4. aspire deploy --clear-cache (provisions AKS + ACR + deploys)
5. Verify pods running via kubectl
6. Port-forward and verify HTTP endpoints
7. aspire destroy for cleanup

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
T1.2: WithVersion - deploys with K8s 1.30, verifies via kubectl
T1.3: NodePool - custom pool with nodeSelector verification
T1.4: VNet - subnet integration with VNet IP verification
T1.5: WorkloadIdentity - Azure Storage ref with WI SA/pod labels
T1.6: ExplicitRegistry - bring-your-own ACR
T1.7: PerPoolSubnet - different subnets per node pool

All tests use aspire deploy --clear-cache (full provisioning pipeline)
and follow the same pattern as T1.1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TypeScript variants of the AKS Tier 1 tests using ExpressReact template:
- TypeScriptAksDeploymentTests: basic AKS deploy from TS AppHost
- TypeScriptAksNodePoolDeploymentTests: custom node pool from TS
- TypeScriptAksVnetDeploymentTests: VNet/subnet integration from TS

Uses addAzureKubernetesEnvironment(), addNodePool(), withSubnet() from
the auto-generated TypeScript SDK (via AspireExport attributes).
Follows TypeScriptExpressDeploymentTests pattern with bundle install.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Same rationale as AsPrivateCluster and WithSkuTier: Kubernetes version
is an infrastructure configuration concern. The internal property and
Bicep generation remain for ConfigureInfrastructure customization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Workspace

These APIs set internal properties but the ConfigureAksInfrastructure
callback never emits the corresponding Bicep (addonProfiles.omsagent,
azureMonitorProfile, data collection rules). Shipping non-functional
APIs is misleading.

Follow-up issue #16150 will add these back when Bicep generation is
implemented. Internal properties remain for future use.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Root cause: string replacement (content.Replace(oldCode, newCode))
failed silently due to line ending mismatches between the template
output and our raw string literals. aspire deploy completed in 97ms
as a no-op because the AKS environment was never added.

Fix: Write the ENTIRE AppHost.cs content instead of patching it.
This is immune to line ending, whitespace, and template changes.
Each test now has a self-documenting raw string literal showing
exactly what AppHost code is being tested.

TypeScript tests: added guard checks to throw if the apphost.ts
replacement didn't change anything.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
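
The failure mode in this commit is easy to reproduce: `string.Replace` returns the input unchanged when the search text is absent, so a CRLF/LF mismatch turns a patch into a silent no-op. The sketch below shows the kind of guard the commit describes for the TypeScript tests, written here in C#. `ReplaceOrThrow` is a hypothetical helper for this note, and the sample strings are invented.

```csharp
using System;

// Hypothetical guard for illustration; the tests in the PR may implement this differently.
static string ReplaceOrThrow(string content, string oldValue, string newValue)
{
    // Fail loudly when the text to patch is not present at all.
    if (!content.Contains(oldValue, StringComparison.Ordinal))
    {
        throw new InvalidOperationException("Replacement target not found; the file was not patched.");
    }

    return content.Replace(oldValue, newValue, StringComparison.Ordinal);
}

// The template uses CRLF line endings, the search text uses LF, so nothing matches.
var template = "var builder = Create();\r\nbuilder.Build().Run();\r\n";
var searchText = "var builder = Create();\nbuilder.Build().Run();\n";

try
{
    ReplaceOrThrow(template, searchText, "patched");
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message); // prints "Replacement target not found; the file was not patched."
}
```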
Default system pool: Standard_D4s_v5 (4 vCPUs) -> Standard_D2s_v5 (2 vCPUs)
Default workload pool: Standard_D4s_v5 (4 vCPUs) -> Standard_D2s_v5 (2 vCPUs)
Max workload pool: 10 -> 3 (reduces quota reservation)

Total minimum vCPU: 8 -> 4 (fits within CI subscription quota)

The deployment tests were failing with:
  ErrCode_InsufficientVCPUQuota: Insufficient vcpu quota requested 8,
  remaining 0 for family standardDSv5Family for region westus3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Remove IVT from Aspire.Hosting.Azure -> Aspire.Hosting.Azure.Kubernetes.
   Revert to linking ProcessSpec/ProcessUtil/ProcessResult directly
   (same pattern as Aspire.Hosting.Azure itself).

2. Rename ParentComputeEnvironment -> OwningComputeEnvironment per Eric's
   suggestion. Better describes the ownership relationship.

3. Remove all 9 new AKS E2E deployment tests due to capacity issues in
   the deployment test subscription. The existing AksStarter* tests remain.
   Will re-add verification tests in a follow-up.

4. Add Helm CLI prerequisite check pipeline step. Fails fast with clear
   error message if helm is not on PATH, before any deployment steps run.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
vmSize defaults to Standard_D2s_v5, minCount to 1, maxCount to 3.
ARM/Bicep requires vmSize (no Azure default), so we provide a sensible
default. Users can now call just aks.AddNodePool("workload") for the
common case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
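
The defaults in this commit and the count checks added earlier (minCount >= 0, maxCount >= 0, minCount <= maxCount) can be sketched together as follows. `CreateNodePoolSketch` and its tuple return type are hypothetical stand-ins written for this note; the real `AddNodePool` operates on Aspire resource builders and is not reproduced here.

```csharp
using System;

// Hypothetical stand-in for illustration; not the PR's AddNodePool signature.
static (string Name, string VmSize, int MinCount, int MaxCount) CreateNodePoolSketch(
    string name,
    string vmSize = "Standard_D2s_v5", // default VM size stated in the commit message
    int minCount = 1,
    int maxCount = 3)
{
    if (string.IsNullOrWhiteSpace(name))
    {
        throw new ArgumentException("Node pool name is required.", nameof(name));
    }
    if (minCount < 0)
    {
        throw new ArgumentOutOfRangeException(nameof(minCount), minCount, "minCount must be >= 0.");
    }
    if (maxCount < 0)
    {
        throw new ArgumentOutOfRangeException(nameof(maxCount), maxCount, "maxCount must be >= 0.");
    }
    if (minCount > maxCount)
    {
        throw new ArgumentException("minCount must be <= maxCount.", nameof(minCount));
    }

    return (name, vmSize, minCount, maxCount);
}

// The common case needs only a name.
var pool = CreateNodePoolSketch("workload");
Console.WriteLine($"{pool.Name}: {pool.VmSize}, {pool.MinCount}-{pool.MaxCount} nodes");
// prints "workload: Standard_D2s_v5, 1-3 nodes"
```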
Stop setting DASHBOARD__FRONTEND__AUTHMODE and DASHBOARD__OTLP__AUTHMODE
to 'Unsecured' on the Aspire dashboard deployed to Kubernetes. This
matches the Docker Compose behavior where the dashboard uses its default
auth mode (BrowserToken).

Update snapshot tests for environment resource tests. Publisher test
snapshots need regeneration (dashboard ConfigMap removed, file numbering
shifts).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… arg validation, update spec

- Fix FindNodePoolResource to add ManifestPublishingCallbackAnnotation.Ignore
- Add defense-in-depth argument validation for az CLI commands
- Update spec to reflect removed WithVersion/WithSkuTier/AsPrivateCluster APIs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Shows 'kubectl logs' command in deployment summary so users can
retrieve the dashboard login token after deployment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The bare 'app' label doesn't exist on pods - use the standard
Kubernetes label 'app.kubernetes.io/component' which matches
the actual deployment labels.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member

@JamesNK JamesNK left a comment


3 issues found: 1 security (CLI argument validation ordering), 2 correctness (stale annotation on registry swap, unconditional Bicep outputs).


var azPath = FindAzCli();
var resourceGroup = await GetResourceGroupAsync(azPath, clusterName, context)
.ConfigureAwait(false);
Member

Security: Cluster name used in CLI command before validation.

clusterName is passed to GetResourceGroupAsync (which runs az resource list --name "{clusterName}") on this line, but ValidateAzureResourceName(clusterName, ...) isn't called until line 263 — after the CLI command has already executed. If the Bicep output contains unexpected characters, the unvalidated value is used in a shell command.

Move the validation before GetResourceGroupAsync:

var clusterName = await environment.NameOutputReference.GetValueAsync(context.CancellationToken).ConfigureAwait(false)
    ?? environment.Name;

var azPath = FindAzCli();

// Validate BEFORE using in CLI commands
ValidateAzureResourceName(clusterName, "cluster name");

var resourceGroup = await GetResourceGroupAsync(azPath, clusterName, context)
    .ConfigureAwait(false);

ValidateAzureResourceName(resourceGroup, "resource group");

Member Author

Good catch — fixed. Moved ValidateAzureResourceName(clusterName, ...) before GetResourceGroupAsync so the cluster name is validated before it is used in any CLI command. resourceGroup is validated immediately after it is resolved.
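
The thread does not show the body of `ValidateAzureResourceName`, so the following is only a sketch of the validate-before-use ordering with an assumed allowlist. `ValidateNameSketch`, its character set, and its length limit are all hypothetical and chosen for illustration; the PR's real rules may be stricter or different.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical validator for illustration; not the PR's ValidateAzureResourceName.
static void ValidateNameSketch(string value, string description)
{
    // Assumed allowlist: letters, digits, '.', '_', '-', '(' and ')', 1 to 90 characters.
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    if (value is null || !Regex.IsMatch(value, @"\A[A-Za-z0-9._()\-]{1,90}\z"))
    {
        throw new ArgumentException($"Invalid {description}: '{value}'.");
    }
}

var clusterName = "aks-abc123";

// Validate first, then hand the value to anything that builds a CLI invocation.
ValidateNameSketch(clusterName, "cluster name");
Console.WriteLine($"'{clusterName}' passed validation and can be passed to the CLI.");

try
{
    ValidateNameSketch("aks; rm -rf /", "cluster name");
}
catch (ArgumentException ex)
{
    Console.WriteLine(ex.Message); // prints "Invalid cluster name: 'aks; rm -rf /'."
}
```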

builder.WithAnnotation(new ContainerRegistryReferenceAnnotation(registry.Resource));
builder.Resource.KubernetesEnvironment.Annotations.Add(
new ContainerRegistryReferenceAnnotation(registry.Resource));

Member

Bug: WithContainerRegistry leaves stale ContainerRegistryReferenceAnnotation on inner K8s environment.

When WithContainerRegistry replaces the default ACR, it adds a new ContainerRegistryReferenceAnnotation to the inner K8s environment but doesn't remove the old annotation (added during AddAzureKubernetesEnvironment) that still references the now-removed default ACR resource. TryGetLastAnnotation returns the newest, so this works today, but the stale annotation referencing a removed resource could cause issues if annotation iteration behavior changes.

Consider removing the old annotation before adding the new one:

// Remove stale annotation referencing the default ACR
var staleAnnotations = builder.Resource.KubernetesEnvironment.Annotations
    .OfType<ContainerRegistryReferenceAnnotation>().ToList();
foreach (var old in staleAnnotations)
{
    builder.Resource.KubernetesEnvironment.Annotations.Remove(old);
}
builder.Resource.KubernetesEnvironment.Annotations.Add(
    new ContainerRegistryReferenceAnnotation(registry.Resource));

Member Author

Fixed — WithContainerRegistry now removes all existing ContainerRegistryReferenceAnnotation instances from the inner K8s environment before adding the new one.

Value = new MemberExpression(
new MemberExpression(new MemberExpression(aksId, "properties"), "oidcIssuerProfile"),
"issuerURL")
});
Member

Bug: Bicep outputs reference properties that may not exist when OIDC/workload identity is disabled.

The oidcIssuerUrl and kubeletIdentityObjectId outputs are always emitted unconditionally, but they reference aks.properties.oidcIssuerProfile.issuerURL and aks.properties.identityProfile.kubeletidentity.objectId respectively. If a user overrides the configuration via ConfigureInfrastructure() to disable OIDC issuer or uses a user-assigned identity instead of system-assigned, these property paths may not exist, causing the Bicep deployment to fail at ARM evaluation time.

Consider making these outputs conditional on the features being enabled:

if (aksResource.OidcIssuerEnabled)
{
    infrastructure.Add(new ProvisioningOutput("oidcIssuerUrl", typeof(string))
    {
        Value = new MemberExpression(...)
    });
}

Low risk since both default to true, but a correctness concern for users who customize infrastructure.

Member Author

Fixed — oidcIssuerUrl output is now conditional on OidcIssuerEnabled. Note that OidcIssuerEnabled defaults to true so this only matters if someone explicitly disables it via ConfigureInfrastructure(). The kubeletIdentityObjectId output is always needed (system-assigned identity is always present) so it remains unconditional.

mitchdenny and others added 2 commits April 17, 2026 11:47
…nditional outputs

- Move ValidateAzureResourceName before GetResourceGroupAsync so cluster
  name is validated before use in az CLI commands
- Remove stale ContainerRegistryReferenceAnnotation from inner K8s
  environment when WithContainerRegistry replaces the default ACR
- Make oidcIssuerUrl Bicep output conditional on OidcIssuerEnabled

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

WithWorkloadIdentity(false) disables both OIDC issuer and workload
identity. Defaults remain true for the happy path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

🎬 CLI E2E Test Recordings — 70 recordings uploaded (commit 95e3164)

Test Recording
AddPackageInteractiveWhileAppHostRunningDetached ▶️ View Recording
AddPackageWhileAppHostRunningDetached ▶️ View Recording
AgentCommands_AllHelpOutputs_AreCorrect ▶️ View Recording
AgentInitCommand_DefaultSelection_InstallsSkillOnly ▶️ View Recording
AgentInitCommand_MigratesDeprecatedConfig ▶️ View Recording
AspireAddPackageVersionToDirectoryPackagesProps ▶️ View Recording
AspireUpdateRemovesAppHostPackageVersionFromDirectoryPackagesProps ▶️ View Recording
Banner_DisplayedOnFirstRun ▶️ View Recording
Banner_DisplayedWithExplicitFlag ▶️ View Recording
Banner_NotDisplayedWithNoLogoFlag ▶️ View Recording
CertificatesClean_RemovesCertificates ▶️ View Recording
CertificatesTrust_WithNoCert_CreatesAndTrustsCertificate ▶️ View Recording
CertificatesTrust_WithUntrustedCert_TrustsCertificate ▶️ View Recording
ConfigSetGet_CreatesNestedJsonFormat ▶️ View Recording
CreateAndRunAspireStarterProject ▶️ View Recording
CreateAndRunAspireStarterProjectWithBundle ▶️ View Recording
CreateAndRunEmptyAppHostProject ▶️ View Recording
CreateAndRunJavaEmptyAppHostProject ▶️ View Recording
CreateAndRunJsReactProject ▶️ View Recording
CreateAndRunPythonReactProject ▶️ View Recording
CreateAndRunTypeScriptEmptyAppHostProject ▶️ View Recording
CreateAndRunTypeScriptStarterProject ▶️ View Recording
CreateJavaAppHostWithViteApp ▶️ View Recording
CreateTypeScriptAppHostWithViteApp ▶️ View Recording
DashboardRunWithOtelTracesReturnsNoTraces ▶️ View Recording
DeployK8sBasicApiService ▶️ View Recording
DeployK8sWithGarnet ▶️ View Recording
DeployK8sWithMongoDB ▶️ View Recording
DeployK8sWithMySql ▶️ View Recording
DeployK8sWithPostgres ▶️ View Recording
DeployK8sWithRabbitMQ ▶️ View Recording
DeployK8sWithRedis ▶️ View Recording
DeployK8sWithSqlServer ▶️ View Recording
DeployK8sWithValkey ▶️ View Recording
DeployTypeScriptAppToKubernetes ▶️ View Recording
DescribeCommandResolvesReplicaNames ▶️ View Recording
DescribeCommandShowsRunningResources ▶️ View Recording
DetachFormatJsonProducesValidJson ▶️ View Recording
DetachFormatJsonProducesValidJsonWhenRestartingExistingInstance ▶️ View Recording
DoListStepsShowsPipelineSteps ▶️ View Recording
DoctorCommand_DetectsDeprecatedAgentConfig ▶️ View Recording
DoctorCommand_WithSslCertDir_ShowsTrusted ▶️ View Recording
DoctorCommand_WithoutSslCertDir_ShowsPartiallyTrusted ▶️ View Recording
GlobalMigration_HandlesCommentsAndTrailingCommas ▶️ View Recording
GlobalMigration_HandlesMalformedLegacyJson ▶️ View Recording
GlobalMigration_PreservesAllValueTypes ▶️ View Recording
GlobalMigration_SkipsWhenNewConfigExists ▶️ View Recording
GlobalSettings_MigratedFromLegacyFormat ▶️ View Recording
InitTypeScriptAppHost_AugmentsExistingViteRepoAtRoot ▶️ View Recording
InvalidAppHostPathWithComments_IsHealedOnRun ▶️ View Recording
LegacySettingsMigration_AdjustsRelativeAppHostPath ▶️ View Recording
LogsCommandShowsResourceLogs ▶️ View Recording
OtelLogsReturnsStructuredLogsFromStarterApp ▶️ View Recording
PsCommandListsRunningAppHost ▶️ View Recording
PsFormatJsonOutputsOnlyJsonToStdout ▶️ View Recording
PublishWithConfigureEnvFileUpdatesEnvOutput ▶️ View Recording
RestoreGeneratesSdkFiles ▶️ View Recording
RestoreRefreshesGeneratedSdkAfterAddingIntegration ▶️ View Recording
RestoreSupportsConfigOnlyHelperPackageAndCrossPackageTypes ▶️ View Recording
RunFromParentDirectory_UsesExistingConfigNearAppHost ▶️ View Recording
SecretCrudOnDotNetAppHost ▶️ View Recording
SecretCrudOnTypeScriptAppHost ▶️ View Recording
StagingChannel_ConfigureAndVerifySettings_ThenSwitchChannels ▶️ View Recording
StartAndWaitForTypeScriptSqlServerAppHostWithNativeAssets ▶️ View Recording
StopAllAppHostsFromAppHostDirectory ▶️ View Recording
StopAllAppHostsFromUnrelatedDirectory ▶️ View Recording
StopNonInteractiveMultipleAppHostsShowsError ▶️ View Recording
StopNonInteractiveSingleAppHost ▶️ View Recording
StopWithNoRunningAppHostExitsSuccessfully ▶️ View Recording
UnAwaitedChainsCompileWithAutoResolvePromises ▶️ View Recording

📹 Recordings uploaded automatically from CI run #24543813501


5 participants